The mutual fund industry manages about a quarter of the assets in the U.S. stock market and 
thus plays an important role in the U.S. economy. The question of how much control is concentrated 
in the hands of the largest players is best discussed quantitatively in terms of the tail behavior of 
the mutual fund size distribution. We study the distribution empirically and show that the tail is 
much better described by a log-normal than a power law, indicating less concentration than, for 
example, personal income. The results are highly statistically significant and are consistent across 
fifteen years. This contradicts a recent theory concerning the origin of the power law tails of the 
trading volume distribution. Based on the analysis in a companion paper, the log-normality is to be 
expected, and indicates that the distribution of mutual funds remains perpetually out of equilibrium. 

PACS numbers: 89.65.Gh,89.75.Da,02.60.Ed 
I. INTRODUCTION 

As of 2007 the mutual fund industry controlled 23% of household taxable assets in the United States.¹ In absolute terms this corresponded to 4.4 trillion USD and 24% of U.S. corporate equity holdings. Large players such as institutional investors are known to play an important role in the market [1]. This raises the question of who has this influence: Are mutual fund investments concentrated in a few dominant large funds, or spread across many funds of similar size? Are there mutual funds that are so large that they are "too big to fail"?

This question is best addressed in terms of the behavior of the upper tail of the mutual fund size distribution. The two competing hypotheses usually made in studies of firms are Zipf's law vs. a log-normal. Zipf's law means that the distribution of the size s is a power law with tail exponent ζ_s ≈ 1, i.e.

$$P(s > X) \sim X^{-\zeta_s}.$$

Log-normality means that log s has a normal distribution, i.e. the density function p_LN(s) obeys

$$p_{LN}(s) = \frac{1}{s\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log s - \mu)^2}{2\sigma^2}\right),$$

where μ and σ are the mean and standard deviation of log s.

From the point of view of extreme value theory this distinction is critical, since it implies a completely different class of tail behavior.² These are both heavy tailed, but



1 Data is taken from the Investment Company Institute's 2007 fact book, available at www.ici.org.

2 According to extreme value theory a probability distribution can have only four possible types of tail behavior. The first three correspond to distributions with finite support, thin tails, and tails that are sufficiently heavy that some of the moments do not exist, i.e. power laws. The fourth category corresponds to distributions that in a certain sense do not converge; it is remarkable that most known distributions fall into one of the first three categories [2].



Zipf's law is much more heavy tailed. For a log-normal all the moments exist, whereas for Zipf's law the mean and all higher integer moments diverge; in particular, an estimator of the mean fails to converge. In practical terms, for mutual funds this would imply that for any sample size N, with significant probability an individual fund can be so large that it is bigger than all other N − 1 funds combined. In contrast, for a log-normal, in the limit as N → ∞ the relative size of a single fund becomes negligible.
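The difference in concentration can be illustrated with a short simulation. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses only the Python standard library, draws Zipf-law sizes from a Pareto distribution with s_min = 1 and ζ_s = 1, draws log-normal sizes with the arbitrary parameters μ = 0 and σ = 1, and reports the average fraction of the combined total held by the single largest draw. All function names are hypothetical.

```python
import random


def largest_share(sizes):
    """Fraction of the combined total held by the single largest element."""
    return max(sizes) / sum(sizes)


def mean_largest_share(draw, n, trials, rng):
    """Average largest-element share over `trials` independent samples of size n."""
    return sum(largest_share([draw(rng) for _ in range(n)]) for _ in range(trials)) / trials


def draw_zipf(rng, zeta=1.0):
    # Pareto with s_min = 1: P(s > X) = X**(-zeta) for X >= 1 (Zipf's law when zeta = 1).
    return rng.paretovariate(zeta)


def draw_lognormal(rng, mu=0.0, sigma=1.0):
    # log s ~ Normal(mu, sigma**2); mu and sigma are illustrative values, not fitted ones.
    return rng.lognormvariate(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000, 100_000):
        z = mean_largest_share(draw_zipf, n, trials=10, rng=rng)
        ln = mean_largest_share(draw_lognormal, n, trials=10, rng=rng)
        print(f"N={n:>7}: Zipf share {z:.3f}   log-normal share {ln:.5f}")
```

With these choices the log-normal share should shrink quickly as N grows, while the Zipf share decays only logarithmically (roughly as 1/log N for ζ_s = 1), so a single draw keeps holding a sizeable fraction of the total.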

This question takes on added meaning because the assumption that mutual funds follow Zipf's law has been argued to be responsible for the observed power law distribution of trading volume [3, 4]. Gabaix et al. have also asserted that the mutual fund distribution follows Zipf's law and have used this in a proposed explanation for the distribution of price returns [5, 6].

We resolve this empirically using the Center for Re- 
search in Security Prices (CRSP) dataset and find that 
the equity fund size distribution is much better described 
by a log-normal distribution. 

Our results are interesting in the broader context of the 
literature on firm size. Mutual funds provide a particu- 
larly good type of firm to study because there are a large 
number of funds and their size is accurately recorded. It 
is generally believed that the resulting size distribution 
from aggregating across industries has a power law tail 
that roughly follows Zipf's law, but for individual industries the tail behavior is debated.³ A large number of stochastic process models have been proposed to explain this.⁴ Our results add support to the notion that for single industries the distribution is log-normal.

The log-normality of the distribution of mutual funds 
is also interesting for what it suggests about the under- 
lying processes that determine mutual fund size. In a 



Some studies have found that the upper tail is a log-normal 
|7l 4l3ll while others have found a power law Il2l4l4| 
4 For past stochastic models see [g, 0, 0> I15H191 




FIG. 1. The CDF for the mutual fund size s (in millions of 2007 dollars) is plotted on a double logarithmic scale. The cumulative distributions for funds existing at the end of the years 1993, 1998 and 2005 are given by the full, dashed and dotted lines, respectively.

Inset: The upper tail of the CDF for the mutual funds existing at the end of 1998 (dotted line) is compared to an algebraic relation with exponent −1 (solid line).



companion paper [21] we develop a model for the random 
process of mutual fund entry, exit and growth under the 
assumption of market efficiency, and show that this gives 
a good fit to the data studied here. We show that while 
the steady-state solution is a power law, the timescale 
for reaching this solution is very slow. Thus given any 
substantial non-stationarity in the entry and exit pro- 
cesses the distribution will remain in its non-equilibrium 
log-normal state. See the discussion in Section V. 



II. DATA SET

We analyze the Center for Research in Security Prices (CRSP) Survivor-Bias-Free US Mutual Fund Database.⁵ The database is survivor-bias free, as it contains historical performance data for both active and inactive mutual funds. We study monthly data from 1991 to 2005⁶ on all reported equity funds. We define an equity fund as one whose portfolio consists of at least 80% stocks. The results are not qualitatively sensitive to this threshold; e.g. we get essentially the same results even if we use all funds. The data set has monthly values for the Total Assets Managed (TASM) by the fund and the Net Asset Value (NAV). We define the size s of a fund to be the value of the TASM, measured in millions of US dollars and corrected for inflation relative to July 2007. Inflation adjustments are based on the Consumer Price Index, published by the BLS.

5 The US Mutual Fund Database can be purchased from the Center for Research in Security Prices (www.crsp.com).

6 There is data on mutual funds starting in 1961, but prior to 1991 there are very few entries. There is a sharp increase in 1991, suggesting incomplete data collection prior to 1991.

III. IS THE TAIL A POWER LAW?

Despite the fact that the mutual fund industry offers a large quantity of well-recorded data, the size distribution of mutual funds has not been rigorously studied. This is in contrast with other types of firms, where the size distribution has long been an active research subject. The fact that the distribution is highly skewed and heavy tailed can be seen in Figure 1, where we plot the cumulative distribution P(s > X) of mutual fund sizes in three different years.

A visual inspection of the mutual fund size distribution suggests that it does not follow Zipf's law.⁷ In the inset of Figure 1 we compare the tail for funds with sizes s > 10² million to a power law P(s > X) ∝ X^{-ζ_s} with ζ_s = 1. Whereas a power law corresponds to a straight line when plotted on a double logarithmic scale, the data show substantial and consistent downward curvature. The main point of this paper is to make more rigorous tests of the power law vs. the log-normal hypothesis. These back up the intuitive impression given by this plot, indicating that the data are not well described by a power law.

To test the validity of the power law hypothesis we use the method developed by Clauset et al. [20]. They use the somewhat strict definition⁸ that the probability density function p(s) is a power law if there exists an s_min such that for sizes larger than s_min, the functional form of the density p(s) can be written

$$p(s) = \frac{\zeta_s\, s_{\min}^{\zeta_s}}{s^{\zeta_s+1}}, \qquad (1)$$



where the distribution is normalized in the interval [s_min, ∞). There are two free parameters, s_min and ζ_s. For a given s_min, ζ_s is estimated by maximum likelihood; the crossover size s_min is then chosen such that it minimizes the Kolmogorov-Smirnov (KS) statistic D, which is the distance between the CDF of the empirical data P_e(s) and that of the fitted model P_f(s), i.e.

$$D = \max_{s \ge s_{\min}} \left| P_e(s) - P_f(s) \right|.$$

Using this procedure we estimate ζ_s and s_min for the years 1991-2005, as shown in Table I. The values of ζ_s computed in each year range from 0.78 to 1.36 and average ζ_s = 1.09 ± 0.04. If these are indeed power laws, this is consistent with Zipf's law. But of course, merely



7 Previous work on the size distribution of mutual funds by Gabaix et al. [5, 6, 19] argued for a power law, while we argue here for a log-normal.

8 In extreme value theory a power law is defined as any function that in the limit s → ∞ can be written $p(s) = g(s)\,s^{-(\zeta_s+1)}$, where g(s) is a slowly varying function. This means it satisfies $\lim_{s\to\infty} g(ts)/g(s) = 1$ for any t > 0. The test for power laws in reference [20] is too strong in the sense that it assumes that there exists an s_0 such that for s > s_0, g(s) is constant.



computing an exponent and getting a low value does not 
mean that the distribution is actually a power law. 
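The s_min selection described above can be sketched as follows. This is a simplified illustration of the Clauset et al. procedure, not the authors' code: it assumes continuous data, takes every observed size as a candidate s_min, uses the closed-form maximum likelihood estimate ζ_s = n / Σ_j log(s_j / s_min) over the n sizes at or above each candidate, and keeps the candidate with the smallest KS distance D. The function names and the `min_tail` guard (a minimum tail size, to avoid fitting a handful of points) are hypothetical choices.

```python
import math


def ks_distance(sorted_tail, cdf):
    """Two-sided KS distance between the empirical CDF of sorted_tail and a model CDF."""
    n = len(sorted_tail)
    return max(
        max(cdf(x) - i / n, (i + 1) / n - cdf(x))
        for i, x in enumerate(sorted_tail)
    )


def fit_power_law_tail(sizes, min_tail=50):
    """Return (D, s_min, zeta_s) for the candidate s_min with the smallest KS distance.

    For each candidate s_min, zeta_s is the continuous maximum likelihood
    estimate n / sum(log(s / s_min)) over the sizes s >= s_min.
    """
    xs = sorted(sizes)
    best = None
    for k in range(len(xs) - min_tail + 1):
        s_min = xs[k]
        if k > 0 and s_min == xs[k - 1]:
            continue  # skip duplicate candidate values
        tail = xs[k:]  # already sorted in increasing order
        log_sum = sum(math.log(s / s_min) for s in tail)
        if log_sum <= 0.0:
            continue
        zeta = len(tail) / log_sum
        # Fitted CDF P_f(s) = 1 - (s / s_min)**(-zeta) on [s_min, inf).
        d = ks_distance(tail, lambda x: 1.0 - (x / s_min) ** (-zeta))
        if best is None or d < best[0]:
            best = (d, s_min, zeta)
    return best
```

Applied to one year's list of fund sizes, `fit_power_law_tail(sizes)` would return analogues of the s_min and ζ_s entries of Table I; the quadratic scan over candidates is affordable for a few thousand funds.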

To test the power law hypothesis more rigorously we follow the Monte Carlo method used by Clauset et al. Assuming independence, for each year we generate 10,000 synthetic data sets, each drawn from a power law with the empirically measured values of s_min and ζ_s. For each data set we calculate the KS statistic to its best fit. The p-value is the fraction of the data sets for which the KS statistic to its own best fit is larger than the KS statistic for the empirical data and its best fit.
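A minimal version of this Monte Carlo test is sketched below, under simplifying assumptions. It holds s_min fixed, draws only the tail from the fitted power law, and refits ζ_s by maximum likelihood for each synthetic set; the full procedure of Clauset et al. also re-estimates s_min for every synthetic data set. The function names, the seed handling and the default of 1,000 synthetic sets (rather than the 10,000 used in the text) are illustrative choices.

```python
import math
import random


def ks_distance(sorted_tail, cdf):
    """Two-sided KS distance between the empirical CDF of sorted_tail and a model CDF."""
    n = len(sorted_tail)
    return max(max(cdf(x) - i / n, (i + 1) / n - cdf(x)) for i, x in enumerate(sorted_tail))


def fit_and_ks(tail, s_min):
    """MLE of zeta_s with s_min held fixed, and the KS distance D to that fit."""
    xs = sorted(tail)
    zeta = len(xs) / sum(math.log(s / s_min) for s in xs)
    d = ks_distance(xs, lambda x: 1.0 - (x / s_min) ** (-zeta))
    return d, zeta


def power_law_p_value(tail, s_min, n_sets=1000, seed=0):
    """Fraction of synthetic power-law tails whose KS distance to their own best fit
    exceeds the empirical one (simplified: s_min is not re-estimated)."""
    rng = random.Random(seed)
    d_emp, zeta = fit_and_ks(tail, s_min)
    exceed = 0
    for _ in range(n_sets):
        # paretovariate(zeta) has P(X > x) = x**(-zeta) for x >= 1; rescale to s_min.
        synthetic = [s_min * rng.paretovariate(zeta) for _ in tail]
        d_syn, _ = fit_and_ks(synthetic, s_min)
        if d_syn > d_emp:
            exceed += 1
    return exceed / n_sets
```

A small p-value means that synthetic power-law samples almost never fit their own best power law as badly as the data do, which is the sense in which the hypothesis is rejected in Table I.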

The results are summarized in Table I. The power law hypothesis is rejected at two standard deviations or more in six of the years and at one standard deviation or more in twelve of the years (out of fifteen in total). Furthermore, there is a general pattern that as time progresses the rejection of the hypothesis becomes stronger. We suspect that this is because of the increase in the number of equity funds. As can be seen in Table I, the total number of equity funds increases roughly linearly in time, and the number in the upper tail N_tail also increases.

We conclude that the power law tail hypothesis is ques- 
tionable but cannot be unequivocally rejected in every 
year. Stronger evidence against it comes from compari- 
son to a log-normal, as done in the next section. 
FIG. 2. A Quantile-Quantile (QQ) plot for the upper tail of the size distribution of equity funds. The quantiles are the base ten logarithm of the fund size, in millions of dollars. The empirical quantiles are calculated from the size distribution of funds existing at the end of the year 1998. The empirical data were truncated from below such that only funds with size s > s_min were included in the calculation of the quantiles. (a) A QQ-plot with the empirical quantiles as the x-axis and the quantiles for the best fit power law as the y-axis. The power law fit for the data was done using the maximum likelihood method described in Section III, yielding s_min = 1945 and ζ_s = 1.107. (b) A QQ-plot with the empirical quantiles as the x-axis and the quantiles for the best fit log-normal as the y-axis, with the same s_min as in (a). The log-normal fit for the data was done using maximum likelihood estimation given s_min [20], yielding μ = 2.34 and σ = 2.5.



IV. IS THE TAIL LOG-NORMAL? 

A visual comparison between the two hypotheses can be made by looking at the Quantile-Quantile (QQ) plots for the empirical data compared to each of the two hypotheses. In a QQ-plot we plot the quantiles of one distribution as the x-axis and the other's as the y-axis. If the two distributions are the same then we expect the points to fall on a straight line. Figure 2 compares the two hypotheses, making it clear that the log-normal is a much better fit than the power law. For the log-normal QQ
plot most of the large values in the distribution fall on 
the dashed line corresponding to a log-normal distribu- 
tion, though the very largest values are somewhat above 
the dashed line. This says that the empirical distribu- 
tion decays slightly faster than a log-normal. There are 
two possible interpretations of this result: Either this is a 
statistical fluctuation or the true distribution really has 
slightly thinner tails than a log-normal. In any case, since 
a log-normal decays faster than a power law, it strongly 
suggests that the power law hypothesis is incorrect and 
the log-normal distribution is a better approximation. 
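The quantile pairs behind a plot like Figure 2 can be computed as in the sketch below. It is an illustration under assumptions, not the code used for the figure: it uses the plotting positions (i + 0.5)/n, the inverse of the fitted power-law CDF from (1), and the quantile of a log-normal conditioned on s ≥ s_min (natural logarithm for μ and σ), all computed with the Python standard library. The function names are hypothetical, and plotting is left to any charting tool.

```python
import math
from statistics import NormalDist


def plotting_positions(n):
    """Probability levels (i + 0.5) / n at which the n sorted observations sit."""
    return [(i + 0.5) / n for i in range(n)]


def power_law_quantile(q, s_min, zeta):
    """Inverse of the fitted power-law CDF 1 - (s / s_min)**(-zeta) on [s_min, inf)."""
    return s_min * (1.0 - q) ** (-1.0 / zeta)


def truncated_lognormal_quantile(q, s_min, mu, sigma):
    """Quantile of a log-normal (log s ~ Normal(mu, sigma)) conditioned on s >= s_min."""
    nd = NormalDist(mu, sigma)
    lo = nd.cdf(math.log(s_min))  # probability mass below the cutoff
    return math.exp(nd.inv_cdf(lo + q * (1.0 - lo)))


def qq_points(tail, model_quantile):
    """(empirical, model) quantile pairs in base-10 log units, as in Fig. 2."""
    xs = sorted(tail)
    return [(math.log10(x), math.log10(model_quantile(q)))
            for x, q in zip(xs, plotting_positions(len(xs)))]
```

For a fitted tail one would call, for example, `qq_points(tail, lambda q: power_law_quantile(q, s_min, zeta))`; points lying close to the line y = x indicate that the model reproduces the empirical quantiles.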

A more quantitative method to address the question of which hypothesis better describes the data is to compare the likelihood of the observations under both hypotheses [20]. We define the likelihood for the tail of the distribution to be

$$L = \prod_{s_j > s_{\min}} p(s_j).$$

We define the power law likelihood as $L_{PL} = \prod_{s_j > s_{\min}} p_{PL}(s_j)$, with the probability density of the power law tail given by (1). The log-normal likelihood is defined as $L_{LN} = \prod_{s_j > s_{\min}} p_{LN}(s_j)$, with the probability density of the log-normal tail given by

$$p_{LN}(s) = \frac{2}{\sqrt{2\pi}\,\sigma s}\left[\operatorname{erfc}\left(\frac{\log s_{\min} - \mu}{\sqrt{2}\,\sigma}\right)\right]^{-1}\exp\left(-\frac{(\log s - \mu)^2}{2\sigma^2}\right). \qquad (2)$$

The more probable it is that the empirical sample is drawn from a given distribution, the larger the likelihood for that set of observations. The ratio L_PL/L_LN therefore indicates which distribution the data are more likely drawn from. We define the log likelihood ratio as
$$\mathcal{R} = \log_{10}\left(\frac{L_{PL}}{L_{LN}}\right). \qquad (3)$$

For each of the years 1991 to 2005 we computed the maximum likelihood estimators for both the power law fit and the log-normal fit to the tail, as explained above and in Section III. Using the fit parameters, the log likelihood ratio was computed; the results are summarized graphically in Figure 3 and in Table I. The ratio is always negative, indicating that the likelihood for the log-normal hypothesis is greater than that of the power law hypothesis in every year. It seems clear that the tails of the mutual fund data are much better described by a log-normal than by a power law.
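The computation of R for a single year can be sketched as follows, under stated assumptions. The power-law term uses the closed-form maximum likelihood estimate of ζ_s for the density (1); the log-normal term maximizes the likelihood of the truncated density (2) numerically. Here a crude compass search over (μ, log σ) stands in for whatever optimizer was actually used, natural logarithms are used inside the likelihoods, and the function names, tolerances and synthetic demo data are hypothetical. For data that really are log-normal the ratio should come out negative.

```python
import math
import random


def power_law_tail_loglik(logs, s_min):
    """Natural-log likelihood of eq. (1) at the closed-form MLE of zeta_s (logs = [log s_j])."""
    n = len(logs)
    log_min = math.log(s_min)
    zeta = n / sum(y - log_min for y in logs)
    return n * math.log(zeta) + n * zeta * log_min - (zeta + 1.0) * sum(logs)


def lognormal_tail_loglik(logs, s_min, mu, sigma):
    """Natural-log likelihood of eq. (2): a log-normal renormalized on [s_min, inf)."""
    tail_mass = 0.5 * math.erfc((math.log(s_min) - mu) / (math.sqrt(2.0) * sigma))
    if tail_mass <= 0.0:
        return float("-inf")  # parameters put (numerically) no mass above s_min
    n = len(logs)
    squares = sum((y - mu) ** 2 for y in logs)
    return (-sum(logs)
            - n * math.log(sigma * math.sqrt(2.0 * math.pi) * tail_mass)
            - squares / (2.0 * sigma ** 2))


def fit_lognormal_tail(logs, s_min, tol=1e-4, max_evals=5000):
    """Maximize lognormal_tail_loglik over (mu, log sigma) by a simple compass search."""
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    best = lognormal_tail_loglik(logs, s_min, mu, sigma)
    step, evals = 1.0, 0
    while step > tol and evals < max_evals:
        for d_mu, d_log_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma * math.exp(d_log_sigma)
            ll = lognormal_tail_loglik(logs, s_min, cand_mu, cand_sigma)
            evals += 1
            if ll > best:
                mu, sigma, best = cand_mu, cand_sigma, ll
                break
        else:
            step /= 2.0  # no move improved the likelihood: refine the step
    return mu, sigma, best


def log10_likelihood_ratio(sizes, s_min):
    """R = log10(L_PL / L_LN) over sizes above s_min; negative values favor the log-normal."""
    logs = [math.log(s) for s in sizes if s > s_min]
    ll_pl = power_law_tail_loglik(logs, s_min)
    _, _, ll_ln = fit_lognormal_tail(logs, s_min)
    return (ll_pl - ll_ln) / math.log(10.0)


if __name__ == "__main__":
    rng = random.Random(3)
    demo_sizes = [rng.lognormvariate(2.0, 1.5) for _ in range(4000)]  # synthetic, not CRSP data
    print(log10_likelihood_ratio(demo_sizes, math.exp(2.0)))
```

Dividing the natural-log difference by log 10 converts it to the base 10 ratio of (3); in practice s_min would be the value selected by the KS procedure of Section III.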
TABLE I. Table of monthly parameter values for equity funds, defined such that the portfolio contains a fraction of at least 80% stocks. The values for each of the monthly parameters (rows) were calculated for each year (columns). The mean and standard deviation are evaluated for the monthly values in each year.

R - the base 10 log likelihood ratio of a power law fit relative to a log-normal fit, as given by equation (3). A negative value of R indicates that the log-normal hypothesis is a likelier description than a power law. For all years the value is negative, meaning that the log-normal distribution is more likely.

N - the number of equity funds existing at the end of each year.

E[ω] - the mean log size of funds existing at the end of each year.

Std[ω] - the standard deviation of log sizes for funds existing at the end of each year.

E[s] - the mean size (in millions) of funds existing at the end of each year.

Std[s] - the standard deviation of sizes (in billions) for funds existing at the end of each year.

ζ_s - the power law tail exponent (1).

s_min - the lower tail cutoff (in millions of dollars) above which we fit a power law (1).

N_tail - the number of equity funds belonging to the upper tail, i.e. with s > s_min.

p-value - the probability of obtaining a goodness of fit at least as bad as the one calculated for the empirical data, under the null hypothesis of a power law upper tail.
FIG. 3. A histogram of the base 10 log likelihood ratios R computed using (3) for each of the years 1991 to 2005. A negative log likelihood ratio implies that it is more likely that the empirical distribution is log-normal than a power law. The log likelihood ratio is negative in every year, in several cases strongly so.



V. IMPLICATIONS OF LOG-NORMALITY 

The log-normal nature of the size distribution has important implications for the role investor behavior plays in the mutual fund industry. Is the size distribution of mutual funds, i.e. the concentration of assets, determined through investor choice, or is it just a consequence of the random nature of the market? In a companion paper [21]
we propose that the size distribution can be explained 
by a simple random process model. This model, char- 
acterizing the entry, exit and growth of mutual funds as 
a random process, is based on market efficiency, which 
dictates that fund performance is size independent and 
fund growth is essentially random. This model provides 
a good explanation of the concentration of assets, sug- 
gesting that other effects, such as transaction costs or 
the behavioral aspects of investor choice, play a smaller 
role. 

The fact that the fund distribution is a log-normal is 
interesting because, as we argue in the companion paper, 



this indicates a very slow convergence toward equilib- 
rium. There we find a time-dependent solution for the 
underlying random process of mutual fund entry, exit, 
and growth, and show that the size distribution evolves 
from a log-normal towards a Zipf power law distribution. 
However, the relaxation to the steady-state solution is ex- 
tremely slow, with time scales on the order of a century or 
more. Given that the mutual fund industry is still young, 
the distribution remains in its non-equilibrium state as 
a log-normal. Furthermore, given that the properties of 
the entry and exit processes are not stable over long pe- 
riods of time, the non-equilibrium log-normal state will 
very likely persist indefinitely. 

VI. CONCLUSIONS 

We have shown in unequivocal terms that the mutual 
fund size distribution is much closer to a log-normal than 
to a power law. Thus, while the distribution is concen- 
trated, it is not nearly as concentrated as it might be. 
Among other things this suggests that the power law distribution observed for trading volume by Gopikrishnan et al. [22] cannot be explained based on a power law distribution for funds. The companion paper discussed
in the previous section [21] constructs a theory that ex- 
plains the log-normality based on the random nature of 
the mutual fund entry, exit and growth, and the very 
long-time scales required for convergence to the steady- 
state power law solution. 